Induction of Modular Classification Rules: Using Jmax-pruning
Authors
Abstract
The Prism family of algorithms induces modular classification rules which, in contrast to decision tree induction algorithms, do not necessarily fit together into a decision tree structure. Classifiers induced by Prism algorithms achieve accuracy comparable to that of decision trees and in some cases even outperform them. Both kinds of algorithms tend to overfit on large and noisy datasets, which has led to the development of pruning methods. Pruning methods use various metrics to truncate decision trees or to eliminate whole rules or single rule terms from a Prism rule set. For decision trees many pre-pruning and post-pruning methods exist; for Prism algorithms, however, only one pre-pruning method has been developed, J-pruning. Recent work with Prism algorithms examined J-pruning in the context of very large datasets and found that the current method does not exploit its full potential. This paper revisits the J-pruning method for the Prism family of algorithms, develops a new pruning method, Jmax-pruning, discusses it in theoretical terms and evaluates it empirically.
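Both J-pruning and Jmax-pruning build on the J-measure of Smyth and Goodman, which quantifies the information content of a rule of the form IF antecedent THEN class. The Python sketch below is illustrative only and not taken from the paper: it assumes that J-pruning halts rule specialisation at the first drop in the J-value, while Jmax-pruning induces the complete rule and then truncates it at the term where the J-value peaked; all function names and the example J-values are hypothetical.

```python
import math

def j_measure(p_antecedent, p_class_given_antecedent, p_class):
    """Smyth & Goodman J-measure of a rule IF antecedent THEN class:
    J = p(antecedent) * j(class; antecedent), measured in bits."""
    def term(p, q):
        # p * log2(p / q), using the usual 0 * log 0 = 0 convention
        return 0.0 if p == 0 else p * math.log2(p / q)
    j = (term(p_class_given_antecedent, p_class)
         + term(1 - p_class_given_antecedent, 1 - p_class))
    return p_antecedent * j

def j_prune(j_values):
    """J-pruning style cut-off (sketch): keep appending rule terms while the
    J-value does not drop; return the number of terms kept."""
    kept = 1
    for i in range(1, len(j_values)):
        if j_values[i] < j_values[i - 1]:
            break
        kept = i + 1
    return kept

def jmax_prune(j_values):
    """Jmax-pruning style cut-off (sketch): induce the full rule first, then
    truncate at the term where the J-value peaked."""
    return max(range(len(j_values)), key=lambda i: j_values[i]) + 1

# Hypothetical J-values observed after each appended rule term:
# a temporary dip at term 3 hides a later, higher peak at term 4.
js = [0.10, 0.15, 0.12, 0.21, 0.18]
print(j_prune(js))     # 2 terms kept: stops at the first decrease
print(jmax_prune(js))  # 4 terms kept: keeps the globally best prefix
```

In this hypothetical sequence a temporary dip in the J-value stops J-pruning after two terms, whereas Jmax-pruning recovers the later, higher peak, which is the kind of situation that motivates revisiting J-pruning.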
Similar Papers
J-measure Based Hybrid Pruning for Complexity Reduction in Classification Rules
Prism is a modular classification rule generation method based on the 'separate and conquer' approach, an alternative to rule induction via decision trees, also known as 'divide and conquer'. Prism often achieves a level of classification accuracy similar to that of decision trees, but tends to produce a more compact, noise-tolerant set of classification rules. As with other...
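For illustration, here is a minimal separate-and-conquer skeleton in the Prism style; the helper names and data layout are assumptions for the sketch, not code from the cited papers. Each rule is specialised term by term until it covers only examples of the target class (the 'conquer' step); the examples the rule covers are then removed before the next rule is induced (the 'separate' step).

```python
def induce_rules_for_class(examples, target_class, attributes):
    """Separate-and-conquer sketch: induce rules for one class, removing
    covered examples after each rule. Examples are dicts with a "class" key."""
    rules, remaining = [], list(examples)
    while any(ex["class"] == target_class for ex in remaining):
        rule, covered = [], remaining
        # Conquer: specialise the rule until it covers only the target class.
        while any(ex["class"] != target_class for ex in covered):
            term = best_term(covered, target_class, attributes, rule)
            if term is None:          # no further attribute-value pair helps
                break
            rule.append(term)
            covered = [ex for ex in covered if matches(ex, [term])]
        rules.append((rule, target_class))
        # Separate: drop the examples the new rule covers.
        remaining = [ex for ex in remaining if not matches(ex, rule)]
    return rules

def matches(example, rule):
    """True if the example satisfies every (attribute, value) term of the rule."""
    return all(example[attr] == val for attr, val in rule)

def best_term(examples, target_class, attributes, rule):
    """Pick the attribute-value pair maximising p(target_class | term),
    the selection criterion Prism-style algorithms typically use."""
    used = {attr for attr, _ in rule}
    best, best_p = None, -1.0
    for attr in attributes:
        if attr in used:
            continue
        for val in {ex[attr] for ex in examples}:
            covered = [ex for ex in examples if ex[attr] == val]
            p = sum(ex["class"] == target_class for ex in covered) / len(covered)
            if p > best_p:
                best, best_p = (attr, val), p
    return best
```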
Jmax-pruning: A facility for the information theoretic pruning of modular classification rules
The Prism family of algorithms induces modular classification rules, in contrast to the Top Down Induction of Decision Trees (TDIDT) approach, which induces classification rules in the intermediate form of a tree structure. Both approaches achieve comparable classification accuracy; however, in some cases Prism outperforms TDIDT. For both approaches pre-pruning facilities have been developed in ...
Using J-Pruning to Reduce Overfitting of Classification Rules in Noisy Domains
The automatic induction of classification rules from examples is an important technique used in data mining. One of the problems encountered is the overfitting of rules to training data. This paper describes a means of reducing overfitting known as J-pruning, based on the J-measure, an information theoretic means of quantifying the information content of a rule, and examines its effectiveness i...
An Information-Theoretic Approach to the Pre-pruning of Classification Rules
The automatic induction of classification rules from examples is an important technique used in data mining. One of the problems encountered is the overfitting of rules to training data. In some cases this can lead to an excessively large number of rules, many of which have very little predictive value for unseen data. This paper is concerned with the reduction of overfitting. It introduces a t...
Induction of Landtype Classification Rules from GIS Data
The feasibility of inducing classification rules for Landtype Associations (LTAs) from instances of human-expert classifications was tested by evaluating the accuracy of 3 rule-induction algorithms on data drawn from a GIS coverage of Southeast Wyoming. In 10-fold cross-validation tests, the accuracy of rules using precipitation, vegetation, geology, elevation, slope, and aspect as features ach...